Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use containerd restart manager to monitor services #3168

Closed
wants to merge 1 commit into from

Conversation

kkroo
Copy link
Contributor

@kkroo kkroo commented Aug 26, 2018

- What I did
Adds system service uptime monitor that restarts stopped system services periodically (default is 10 seconds)

- How I did it
Integrate containerd containerd/containerd#2318
Requires CONTAINERD_COMMIT v1.2.0

- How to verify it

(ns: sshd) linuxkit-08002771aa74:/var/log# ctr -n services.linuxkit c info sshd
{
    "ID": "sshd",
    "Labels": {
        "containerd.io/restart.logpath": "/var/log/sshd.restart.log",
        "containerd.io/restart.status": "running"
    },
...

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@GordonTheTurtle
Copy link
Collaborator

Please sign your commits following these rules:
https://github.com/moby/moby/blob/master/CONTRIBUTING.md#sign-your-work
The easiest way to do this is to amend the last commit:

$ git clone -b "restart_monitor" git@github.com:kkroo/linuxkit.git somewhere
$ cd somewhere
$ git commit --amend -s --no-edit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.

@justincormack
Copy link
Member

Hi, we can't merge PRs unless they are signed-off-by as per instructions above.

I think restart policy should probably be configurable.

Copy link
Contributor

@ijc ijc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is using the provided interfaces correctly, I can't seem to find any other examples (there seem to be none in the containerd tree, e.g. in ctr), but it certainly looks suspiciously racy to me as is.

pkg/init/cmd/service/cmd.go Outdated Show resolved Hide resolved
pkg/init/cmd/service/cmd.go Outdated Show resolved Hide resolved
pkg/init/cmd/service/cmd.go Outdated Show resolved Hide resolved
@ijc
Copy link
Contributor

ijc commented Aug 30, 2018

containerd/containerd#2318 shows it using container.Update to set it after starting, also using WithNoRestart on teardown. However I'm not completely convinced this is intended for this use case. From containerd/containerd#2318 (comment):

The usecase is for system level containers, it will allow remote snapshotters, runtimes, and such to be kept running without any external deps on a system. Its not built for the kube or docker restart policies, just a simple internal function to keep low level system containers running, maybe something like kubelet in a container ;)

It seems like it is for containerd internal functionality not 3rd party use (as much as some linuxkit containers could be considered "system level containers").

@kkroo
Copy link
Contributor Author

kkroo commented Nov 27, 2018

Addressing comments regarding race conditions. Regarding the use of this interface, it is a plugin with public interfaces and seems very well suited for improving the availability of linuxkit's system services.

@medic15
Copy link

medic15 commented Nov 27, 2018

This needs to be configurable for each service. Not all those who wander are lost. Not all services that stop are broken.

@kkroo
Copy link
Contributor Author

kkroo commented Nov 28, 2018

This needs to be configurable for each service. Not all those who wander are lost. Not all services that stop are broken.

Long running services are typically ones where any exit code would be anomalous. Yes, we need to continue to support one off execution. This change would obviously change that and restart each service that has exited. One off executables are still supported and probably better served under the onboot section of the runtime definition where there is reliable execution ordering for dependencies. This would provide reliable long running services to Linuxkit without the need to restart the whole system. Besides, I think it would be interesting to have a service that exits as part of its nominal behavior be monitored by this change to implement watchdogs or custom polling functionality.

@ijc
Copy link
Contributor

ijc commented Nov 28, 2018

@medic15 and @justincormack are correct, this needs to be configurable per service.

I still have concerns regarding this use of a plugin which is "The usecase is for system level containers" and "Its not built for the kube or docker restart policies, just a simple internal function to keep low level system containers running". Whether or not it happens to currently meet the use case here if we are using it outside its "support envelope" we have no guarantee that it will continue to be suitable in the future. I'd be happy to be overruled by the other maintainers on that though.

@justincormack
Copy link
Member

At present a few of the services we run may terminate (eg rngd if no rng is supported) but this could be changed.

@medic15
Copy link

medic15 commented Nov 28, 2018

My use case involves populating a Redis in-memory database using data from several disparate sources. This can't be done in the onboot phase since my init service and Redis cannot be running at the same time.

@justincormack
Copy link
Member

@medic15 yes that seems a reasonable use case where it should be configurable.

Signed-off-by: Omar Ramadan <omar.ramadan93@gmail.com>
@kkroo
Copy link
Contributor Author

kkroo commented Nov 28, 2018

@medic15 @justincormack I still don't understand the use case and why such a one-off can't be added to the end of the onboot with some bash script that would wait for redis to come up. Hopefully you can help clarify.

In any case, I've added a boolean flag noRestart to the runtime configuration section than can disable the restart. PTAL!

@justincormack
Copy link
Member

Services are not started until all onboot containers have exited.

@rn
Copy link
Member

rn commented Dec 19, 2019

closing this as there was no activity for over a year and some changes were requested.

@rn rn closed this Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants